10-315 Introduction to Machine Learning (SCS Majors)
Lecture 1: Introduction
Leila Wehbe
Carnegie Mellon University
Machine Learning Department
Lecture based on chapter 4 of Hal Daumé III's course notes, Kilian Weinberger's lecture 3, Tom Mitchell's lecture 1, and Matt Gormley's lecture 1.
Lecture outcomes¶
- Core concepts and problem definitions in Machine Learning
- Overview of applications
Welcome to 10-315 Intro to Machine Learning¶
Schedule:¶
Exam 1: October 9th
Exam 2: December 2nd (non-comprehensive)
Lectures on Monday and Wednesday, Recitation most Fridays.
- EXCEPTION: no lecture on Labor Day (September 2).
- EXCEPTION: no lecture on September 9th; instead, lectures that week will be on Wednesday and Friday (September 11th and 13th).
Homework 0 out this Thursday.
Highlights of Course Logistics¶
- 7 HW assignments (60%)
- 2 exams (see schedule, 20% each)
- Homework assignments will be submitted on Gradescope.
- 8 late days in total, maximum 3 per assignment.
- Collaboration is OK if you only discuss ideas with each other, and then write and implement your solutions separately.
- Collaboration should be disclosed; there is a dedicated disclosure section in each homework assignment.
- What happens if you disclose / don't disclose?
- What happens if you copy code from someone else (even if you change it)?
- What happens if you use generative AI to create your code?
AIV risk due to generative AI use¶
Nichelle has put together a list of common reasons students turned to ChatGPT for help and alternative actions which are less likely to raise flags. It is our hope that you learn from this collective experience and complete this course responsibly.
- Reason: I used ChatGPT because I was in a time crunch.
- Alternative: This is the most common reason students turn to ChatGPT. We suggest starting your assignments as early as possible. You will not know how much time an assignment will take to complete until you attempt it. If you find that you will not meet the deadline due to an emergency, reach out to the course EA before the assignment is due or as soon as you are able.
AIV risk due to generative AI use (continued)¶
Reason: I used ChatGPT to look up numpy functions.
Alternative: We suggest you use numpy.org. Again, if you still intend to use ChatGPT, which is not recommended, be sure to prepend your prompt with “don’t give me any code in any language”. Reminder: this advice is not a guarantee that you will not be flagged for a potential AIV.
Reason: I used ChatGPT to debug my code.
Alternative: Bring detailed pseudocode to office hours that describes your implementation design. If you do not have pseudocode, the TA will not look at your code, but will instead ask you to sketch out pseudocode at the chalkboard and discuss it from there. After discussing at a high level, if your 10 minutes have not expired, the TA may have time to look at your code. Reminder: This is not a programming course; you are expected to know how to debug code. Giving your code to ChatGPT will result in an AIV.
What is machine learning?¶
"How can we build computer programs that automatically improve their performance through experience?"
- Study of algorithms that
- improve their performance P
- at some task T
- with experience E
- well-defined learning task: (P,T,E)
How can we learn from data?
How robust is what we learn? What types of assumptions do we make with different approaches? What are the guarantees? How do we pick an approach?
Natural language processing¶
Computer Vision¶
Speech recognition¶
Robots¶
Games and reasoning¶
Protein folding¶
The Key: Machine Learning¶
Skin Cancer Diagnosis¶
Predict Cardiovascular Risk from Retinal Photographs¶
Machine Learning Theory¶
Social impacts of Machine Learning¶
- Better, evidence-based, decision making in many domains
- Medical diagnosis, Credit card fraud detection, Online tutoring, Anticipating equipment failures, Marketing, Legal sentencing, …
- Created breakthroughs in AI, with huge impact on society
- Computer vision, speech, text processing, self-driving cars, games, …
- Raises new issues
- Explainability
- Bias
- Privacy
- If big data is key to successful ML, who controls access to the data?
- …
We will cover in this course¶
Machine learning algorithms¶
- Supervised learning
- Classification
- Regression
Supervised Learning Problem Statement¶
The goal is to learn a function $c^*$ that maps input variables $X$ to output variables $y$, based on a set of labeled training examples.
- classification: $y$ is binary or multiclass
- regression: $y$ is continuous
Training Data: Given a training set of $n$ labeled examples:$\{(X_1, y_1), (X_2, y_2), \dots, (X_n, y_n)\},$ where $X_i \in \mathcal{X} $ represents the input features and $y_i \in \mathcal{Y} $ represents the corresponding labels, the goal is to estimate the optimal function $c^*$ that best predicts the labels for new, unseen data.
Hypothesis Space: The function $c^*$ is chosen from a family of hypotheses $ \mathcal{H} $. That is, $ c^* \in \mathcal{H} $, where $ \mathcal{H}$ represents the set of all possible functions that could map inputs to outputs.
Learning Rule: A learning rule is applied to select the optimal function $c^* $ from the hypothesis space $\mathcal{H}$. The learning rule is typically defined based on an optimization algorithm that seeks to minimize a cost function over the training data.
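As a minimal sketch of such a learning rule, the snippet below selects $c^*$ from a tiny finite hypothesis space of threshold classifiers by empirical risk minimization under the 0-1 loss. The dataset, thresholds, and hypothesis space are all made up purely for illustration:

```python
import numpy as np

# Toy training set: one feature, binary labels (values made up for illustration)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 1, 1])

# A tiny finite hypothesis space H: threshold classifiers h_t(x) = 1[x > t]
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]
hypotheses = [lambda x, t=t: (x > t).astype(int) for t in thresholds]

# Learning rule: pick the hypothesis minimizing average 0-1 loss on training data
def empirical_risk(h):
    return np.mean(h(X) != y)

best = min(hypotheses, key=empirical_risk)
print(empirical_risk(best))  # 0.0: the stump at t = 2.5 fits this toy set perfectly
```

Real hypothesis spaces (e.g., all linear classifiers) are infinite, so the search over $\mathcal{H}$ is done with an optimization algorithm rather than by enumeration.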
Supervised Learning Problem Statement (continued)¶
Loss Function: A loss function $L(y, \hat{y})$ quantifies the error between the predicted value $\hat{y} = c(X)$ and the actual value $y$ for a single data point.
- Examples of Loss Functions
- 0-1 Loss (for Classification): The 0-1 loss function is used in classification tasks and is defined for a single point as: $$ L(y, \hat{y}) = \begin{cases} 0, & \text{if } y = \hat{y} \\ 1, & \text{if } y \neq \hat{y} \end{cases} $$
- The loss over a dataset (also referred to as the error rate): $\frac{1}{n} \sum_{i=1}^n L(y_i, \hat{y}_i)$
- Mean Squared Error (MSE, for Regression): The MSE loss function is commonly used in regression tasks and is defined for a single point as:
$$ L(y, \hat{y}) = (y - \hat{y})^2 $$
- The MSE is the average of squared differences over all training examples: $\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
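Both dataset-level quantities above are one-line averages in numpy. The labels and predictions below are made up for illustration:

```python
import numpy as np

# 0-1 loss averaged over a dataset = error rate (toy labels, made up)
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 1, 1, 0])
error_rate = np.mean(y_true != y_pred)  # fraction of misclassified points
print(error_rate)  # 0.5 (2 of 4 points wrong)

# Mean squared error for regression (toy values, made up)
y_true_r = np.array([2.0, 0.5, 3.0])
y_pred_r = np.array([2.5, 0.0, 3.0])
mse = np.mean((y_true_r - y_pred_r) ** 2)
print(mse)  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667
```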
Test Data: A set of $m$ labeled examples:$\{(X_j, y_j), j\in 1 \dots m\},$ which is sampled from the same distribution as the training set.
Example Classifier¶
Given the dataset:
- Compute the majority vote classifier. What is the training error rate?
- Decision stump on Family History
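A majority vote classifier ignores the features and predicts the most common training label for every input; its training error is the fraction of points in the minority class. The labels below are made up; substitute the dataset from the slide:

```python
import numpy as np

# Toy binary labels (made up for illustration)
y_train = np.array([1, 1, 0, 1, 0, 0, 1, 1])

majority_label = np.bincount(y_train).argmax()   # most frequent class
y_hat = np.full_like(y_train, majority_label)    # same prediction for every point

train_error = np.mean(y_hat != y_train)          # fraction of minority-class points
print(majority_label, train_error)  # 1 0.375
```

A decision stump (like the one on Family History) improves on this by splitting on a single feature and taking a majority vote within each branch.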
Machine learning algorithms¶
- Supervised learning
- Classification
- Regression
- Unsupervised learning
- Specific case: self-supervised learning
Self-Supervised Learning Problem Statement¶
The model learns to predict part of the input data from other parts of the input data.
Training Data: Given a set of unlabeled data points:$\{X_1, X_2, \dots, X_n\},$ where $X_i \in \mathcal{X}$ represents the input features, the model is trained to predict some aspect of $X$ from other aspects of $X$. Unlike supervised learning, there are no explicit labels $y_i$ provided by humans.
Examples:
- Language modeling: given words in a sequence, predict the next word.
- Image inpainting: fill in the missing parts of an image.
- Contrastive learning: Learn to distinguish between different transformations or augmentations of the same data point $X_i$. The task is to bring representations of similar pairs (e.g., different augmentations of the same image) closer in the feature space while pushing representations of dissimilar pairs apart.
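The language-modeling example above can be sketched in a few lines: the "labels" are just the next word in the sequence, so no human annotation is needed. This toy bigram model (corpus text made up for illustration) predicts the most frequent continuation seen in training:

```python
from collections import Counter, defaultdict

# Unlabeled corpus: the supervision signal comes from the data itself
corpus = "the cat sat on the mat the cat ran".split()

# Count, for each word, which words follow it (a bigram model)
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Predict the most frequent continuation observed after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat": it follows "the" twice, "mat" only once
```

Modern language models replace the count table with a neural network, but the self-supervised setup (predict part of the input from the rest) is the same.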
Machine learning algorithms¶
- Supervised learning
- Classification
- Regression
- Unsupervised learning
- Specific case: self-supervised learning
- Reinforcement learning
Reinforcement Learning Problem Statement¶
RL is a type of machine learning where an agent learns to make decisions by interacting with an environment. The goal is for the agent to learn a policy that maximizes cumulative rewards over time.
Agent: the learner or decision-maker that interacts with the environment. At each time step, the agent observes the current state $s_t$, takes an action $a_t$, and receives a reward $r_t$ from the environment.
Policy: A policy $\pi(a|s)$ is a mapping from states to probabilities of selecting each possible action. The goal of reinforcement learning is to find an optimal policy $\pi^*$ that maximizes the expected cumulative reward over time.
$$ \pi^* = \arg\max_{\pi} \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^t r_t \right] $$
where $\gamma \in [0,1]$ is the discount factor that balances immediate and future rewards.
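For a concrete feel for the objective inside the expectation, here is the discounted return $\sum_t \gamma^t r_t$ computed for a made-up finite reward sequence:

```python
# Discounted return for a toy reward sequence (values made up for illustration)
gamma = 0.9                    # discount factor: future rewards count less
rewards = [1.0, 0.0, 2.0, 1.0] # r_0, r_1, r_2, r_3

G = sum(gamma**t * r for t, r in enumerate(rewards))
print(G)  # 1 + 0 + 0.81*2 + 0.729*1 = 3.349
```

With $\gamma$ close to 0 the agent is myopic; with $\gamma$ close to 1 it weighs distant rewards almost as heavily as immediate ones.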
Learning algorithm: How to learn the policy? We will explore multiple approaches later in the course.
Class outline:¶
- Supervised learning:
- Perceptron
The Perceptron¶
- Introduced by Rosenblatt in 1958
- Inspired by real neurons
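As a preview of the algorithm covered next, here is a minimal sketch of the perceptron learning rule: on each mistake, nudge the weight vector toward the misclassified example. The toy dataset is made up and linearly separable through the origin; the bias term is omitted for brevity:

```python
import numpy as np

# Made-up, linearly separable toy data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)                        # weight vector (no bias, for brevity)
for _ in range(10):                    # a few passes over the data
    for x_i, y_i in zip(X, y):
        if y_i * np.dot(w, x_i) <= 0:  # mistake (or exactly on the boundary)
            w += y_i * x_i             # update rule: w <- w + y_i * x_i

print(np.sign(X @ w))  # [ 1.  1. -1. -1.]: all four points classified correctly
```

The surprising part, which we will see in lecture, is that for linearly separable data this loop is guaranteed to stop after a bounded number of mistakes.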